DOCUMENT RESUME

ED 186 4[illegible]3                                        BOO 060

AUTHOR        Zane, Thomas; Hursh, Daniel
TITLE         Verification of Reliability and Validity of a Behavior Rating Scale.
PUB DATE      [Sep 79]
NOTE          21p.; Paper presented at the Annual Meeting of the American Psychological Association (87th, New York, NY, September 1-5, 1979).

EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   *Behavior Rating Scales; *Check Lists; *Parent Child Relationship; *Test Reliability; Test Validity

ABSTRACT
A procedure was devised to assess the degree of reliability and validity of a behavior rating scale checklist used to evaluate parents' training skills with handicapped infants. Reliability was tested by independent observers viewing videotapes of training sessions and filling out checklist ratings of the parents' behaviors. Validity was assessed by observers viewing videotapes of training sessions, recording each training behavior exhibited by the parent as correct or incorrect, and then comparing these results to results of checklist ratings for the same training session. The degree of reliability was consistently high for one of two observer pairs, and the degree of validity was high for both pairs, in that the results obtained by the checklist corresponded with the results obtained by the detailed frequency counts. The results indicated that the checklist seemed to be easy to use as well as being an accurate assessment device. (Author/CTM)




****************************************
* Reproductions supplied by EDRS are the best that can be made *
* from the original document. *
****************************************






U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
NATIONAL INSTITUTE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF EDUCATION POSITION OR POLICY.



Verification of Reliability and Validity of a Behavior Rating Scale

Thomas Zane
Department of Psychology
University of Massachusetts
Tobin Hall
Amherst, MA 01003

Dr. Daniel Hursh
Educational Psychology Department
West Virginia University
Morgantown, WV 26506



"PERMISSION TO REPRODUCE THIS MATERIAL HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)."






Verification of Reliability and Validity of a Behavior Rating Scale

A procedure was devised to assess the degree of reliability and validity of a behavior rating scale checklist used to evaluate parents' training skills. Reliability was tested by independent observers viewing videotapes of training sessions and filling out checklist ratings of the parents' behaviors. Validity was assessed by observers viewing videotapes of training sessions, recording each training behavior exhibited by the parent as correct or incorrect, and then comparing these results to results of checklist ratings for the same training session. The degree of reliability was consistently high for one of two observer pairs, and the degree of validity was high for both pairs, in that the results obtained by the checklist corresponded with the results obtained by the detailed frequency counts. The results indicated that the checklist seemed to be easy to use as well as being an accurate assessment device.






Verification of Reliability and Validity of a Behavior Rating Scale

Many different systems for recording human behavior have been devised. These observational instruments range from simple rating scales, which typically consist of generally defined categories of behaviors, to detailed observational systems, which require exact counting of all responses at different intervals. The advantages of the rating scale format are that this type of instrument usually is not difficult for staff to learn to use, and it does not require a great deal of time to implement. On the other hand, a potentially serious disadvantage with rating scales is that, due to the vagueness of the scale, results obtained may not be reliable and valid, necessary characteristics of any recording system. If a checklist scale of behaviors could meet reliability and validity standards, then it would be an observational instrument of simple design, easily used, and one in whose results one could have confidence.

A checklist rating scale was created to assess the quality of training exhibited by parents of handicapped infants (copies of the checklist and instructions for use may be obtained by contacting the authors). This checklist allowed assessment of parents' training abilities in six areas: instruction, prompt, non-word cue, reinforcement, model, and ignoring. Each of these training skill categories was specifically defined. Staff used the checklist in the following manner. As a parent worked on a task with the child, the staff person observed the training. After training ended, the observer immediately filled out the checklist by rating the proportion of training behaviors, per category, exhibited correctly according to the predetermined definitions. For example, the staff person determined that out of all the times the parent gave instructions, only half of these instances were correctly exhibited as determined by the definition for the use of instructions. Thus, the staff person marked "half" on the rating scale.

Once this checklist scale was created, it was necessary to determine if it met reliability and validity standards. First, it was necessary to determine if independent observers could agree in their scoring of the training behaviors of the same parent during the same session (reliability). Also, it was necessary to determine if results from any checklist actually portrayed what really occurred during that session (validity). Reliability was tested by two observers viewing a videotape of a training session (involving parent and child) and independently filling out the checklist ratings of the parent's behavior. Reliability was defined in terms of the proportion of checklist categories which the two observers scored exactly the same. Validity was assessed by first having two observers view a videotape of a training session and record each training behavior exhibited by the parent as either correct or incorrect. This yielded an exact frequency count of the parent's training behavior. The results of this frequency count were compared to the results of checklist ratings for the same training session. The extent to which each category was scored exactly the same way on both the checklist and frequency counts determined the degree of validity.

Method

For reliability and validity assessment two pairs of observers were used, hereafter identified as Pairs 1 and 2. Each pair worked at different times of the day.

Videotapes of 50 different training activities involving a parent working with the child were used for assessment purposes. Confidentiality of clients was protected by (a) obtaining parents' permission to make videotapes, (b) never identifying parent or child by name to the observers, and (c) notifying the observers that their work was to remain confidential. Each activity was approximately 3-5 minutes in length. The activities were viewed by the observers in a random order.

Reliability assessment. Reliability was computed by dividing the number of categories that the two observers rated the same by the total number of categories rated, multiplied by 100. Reliability was always computed using the initial ratings for a training activity, before observers discussed the ratings.
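The percent-agreement computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `percent_agreement`, the dictionary layout, and the example ratings are hypothetical and do not come from the study's materials.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percent of categories that two observers rated exactly the same.

    Each argument maps a category name to that observer's initial rating
    ("none", "few", "half", "most", or "all").
    """
    if ratings_a.keys() != ratings_b.keys():
        raise ValueError("both observers must rate the same categories")
    if not ratings_a:
        raise ValueError("at least one category must be rated")
    same = sum(1 for cat in ratings_a if ratings_a[cat] == ratings_b[cat])
    return 100.0 * same / len(ratings_a)


# Hypothetical initial ratings for one activity (not data from the study).
observer_1 = {"instruction": "half", "prompt": "all", "non-word cue": "none",
              "reinforcement": "most", "model": "all", "ignoring": "none"}
observer_2 = {"instruction": "half", "prompt": "all", "non-word cue": "none",
              "reinforcement": "half", "model": "all", "ignoring": "none"}

# The observers agree on 5 of the 6 categories.
print(round(percent_agreement(observer_1, observer_2), 1))
```

Only exact matches count as agreement, which mirrors the "rated the same" wording; adjacent ratings such as "half" and "most" are scored as a disagreement.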

The observers were taught to use the checklist by instructions and practice, in the following order:

1. The history, rationale, and purpose were explained;

2. The definitions of each category were reviewed and explained;

3. Each observer completed the "example lesson #2" sheet;

4. A "training tape" (videotape consisting of parents working with their children, illustrating different types of training behaviors) was viewed and the observers recorded which behaviors were exhibited;

5. The procedure for completing the actual ratings on the checklist was discussed;

6. The first taped activity was then rated by each pair of observers. First, the definitions of one of the categories of that activity were discussed. Then, the activity was played one time, and when it ended the observers independently rated only that category. Then the definitions of a second category were discussed and the activity played again. When it finished the observers independently rated this second category. This procedure of evaluating one category at a time continued for all categories. After all were rated, the reliability of agreement was computed. A discussion followed concerning the rating and any disagreements. The same activity was viewed in the manner described above until 80-100% reliability was obtained between the two observers on each category.

7. For the next two activities, observers rated two categories at one time. Thus, the definitions of two categories were discussed and then the tape of that activity was played one time. Observers independently rated the two categories. Then the definitions of the next two categories were discussed, the tape played once again, and independent ratings made. This procedure continued until all of the categories were rated. Reliability of agreement was computed. The same activity was viewed in the manner described above until 80-100% reliability was obtained between the two observers for each category.

8. Beginning with the fourth activity, the observers were required to rate all categories from just one viewing of the activity. The definitions for each category were first discussed. Then the tape was played one time, after which the observers independently rated each category. The reliability was computed and a discussion of any discrepancies followed. If the initial reliability was less than 80-100% for any category, the activity was played again until it met that criterion.

Reliability assessment continued in this manner until the rate of percent agreement showed a stable level or a downward (negative) trend over five consecutive activities viewed during one 60-minute work session. At this point validity assessment began.

Validity assessment. Validity was determined by the agreement between ratings of a checklist and the frequency of behaviors actually exhibited by parents during a session. To assess validity the observers first learned to compute an exact frequency count of each correct and incorrect training behavior exhibited by the parent.

When computing the frequency count, the tape of an activity was replayed as often as either observer requested. Observers used stopwatches to time durations of and intervals between behaviors. The observers marked each instance of a training behavior as either correct or incorrect, according to the definitions of the categories. In this phase as well, reliability of agreement between the scores obtained by the frequency counts of the two observers was computed. Reliability scores were computed using the initial counts of each observer. Any category with a percent reliability of 80-100% from the initial viewing was considered reliably measured. However, the observers were required to watch the activity again and do another frequency count of behaviors within any category that did not yield this percent reliability agreement.

Once reliability in each category was 80-100%, each observer scored the results of the frequency count in terms of the checklist ratings. For example, if the frequency count showed that 57% of all instructions were used correctly, an observer rated the Instruction category as "half" (34-67%).
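The conversion from a frequency count to a checklist rating can be sketched as follows, using the bands printed on the checklist form (None 0%, Few 1-33%, Half 34-67%, Most 68-99%, All 100%). The function name `checklist_rating` is hypothetical, and the handling of fractional percentages (rounding to the nearest whole percent, while reserving "none" and "all" for exactly zero and exactly every instance) is an assumption, since the paper does not say how such cases were scored.

```python
def checklist_rating(correct, total):
    """Map a frequency count (correct out of total instances) to a rating."""
    if total <= 0:
        raise ValueError("the technique must have been observed at least once")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    if correct == 0:
        return "none"   # 0%
    if correct == total:
        return "all"    # 100%
    # Assumption: round to a whole percent, then keep the value inside 1-99
    # so that "none" and "all" stay reserved for the exact extremes.
    pct = min(99, max(1, round(100.0 * correct / total)))
    if pct <= 33:
        return "few"    # 1-33%
    if pct <= 67:
        return "half"   # 34-67%
    return "most"       # 68-99%


# The paper's example: 57% of instructions used correctly is rated "half".
print(checklist_rating(57, 100))
```

Counts rather than a bare percentage are taken as input so that the two extreme ratings can be decided exactly, without floating-point comparison.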

The observers were taught to perform the frequency count evaluation by instruction and practice in the following order:

1. The rationale and purpose were explained;

2. The frequency count data form was explained;

3. The frequency count of the first activity consisted of the two observers viewing the tape and discussing and recording what was observed. In other words, the observers did not record independently.

4. Once the observers agreed on the recording of the first activity, the same tape was played again. This time the observers independently counted the behaviors. After the activity was completed, the reliability of agreement between the observers was computed. A discussion followed concerning any differences in the frequency counts. The same activity was used until there was 80-100% reliability for each category.

5. Beginning with the next activity the observers independently computed a frequency count of parent behaviors. Only discussion of the definitions of the categories was allowed before the computing began. Reliability was computed for the initial scoring. If any category yielded less than 80-100% reliability, the tape was played again and frequency counts made until the percent agreement reached this criterion.

Use of the frequency counts continued in this manner. The activities used for viewing were those activities on which the other pair of observers had obtained 100% reliability with their checklist ratings.

To determine the actual validity assessment, the ratings obtained from the frequency count done by one pair of observers were compared to the ratings obtained from the checklists done by the other pair of observers, all on the same activity. The percent agreement between the two instrument ratings was computed by dividing the number of categories that had the same ratings on both instruments by the total number of categories rated, multiplied by 100.
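A minimal sketch of this validity comparison, extended to summarize agreement over several activities as a range and a mean, might look like the following. All names and example ratings are hypothetical, and the sketch assumes that all six categories were rated on every activity, which the paper does not state.

```python
from statistics import mean

CATEGORIES = ("instruction", "prompt", "non-word cue",
              "reinforcement", "model", "ignoring")


def instrument_agreement(count_ratings, checklist_ratings):
    """Percent of categories given the same rating by both instruments."""
    same = sum(1 for cat in CATEGORIES
               if count_ratings[cat] == checklist_ratings[cat])
    return 100.0 * same / len(CATEGORIES)


def summarize(agreements):
    """Return (lowest, highest, mean) percent agreement across activities."""
    return min(agreements), max(agreements), mean(agreements)


# Hypothetical ratings for three activities (not data from the study).
# Each tuple is (ratings from one pair's frequency count,
#                ratings from the other pair's checklist).
all_rating = dict.fromkeys(CATEGORIES, "all")
one_off = dict(all_rating, prompt="most")
activities = [
    (all_rating, all_rating),  # identical on all six categories
    (all_rating, one_off),     # instruments differ on "prompt" only
    (one_off, one_off),        # identical again
]

scores = [instrument_agreement(counts, checklist)
          for counts, checklist in activities]
low, high, average = summarize(scores)
print(f"range {low:.1f}-{high:.1f}%, mean {average:.1f}%")
```

The arithmetic is the same as for inter-observer reliability; what changes is what is being compared: two instruments applied by different observer pairs, rather than two observers using the same instrument.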


Results

Reliability Assessment

Pair 1 required approximately 5 hours of instruction and practice before they began evaluating activities using the checklist, rating all categories at once. The data concerning the percent of reliability of Pair 1 are found in Figure 1. The percent agreement using the checklist ranged from 13-100%, with a mean of 58.4% and a median of 30.0%. The rate of agreement stabilized in a downward (negative) trend after 17 activities.

Pair 2 began evaluating activities using the checklist and rating all categories at once after approximately 4 hours of instruction and practice. The data concerning the percent reliability agreement of this pair of observers are found in Figure 2. The percent agreement using the checklist ranged from 0-100%, with a mean of 71.8% and a median of 60.0%. The rate of agreement stabilized in a downward (negative) trend after 32 activities.

Insert Figures 1 and 2 about here

Validity Assessment

Pair 1 required approximately 2 hours to learn how to compute the frequency count and to begin counting behaviors in activities. Pair 1 used the frequency count on a total of five activities. The reliability between these two observers ranged from 62-88%, with a mean of 61.0% (see Figure 1). Pair 2 was instructed on the use of the frequency count for approximately 3 hours. Their reliability using the frequency count on three activities ranged from 79.4-100%, with a mean of 89.7% (see Figure 2).

The reliability between the coded activities done by one pair and the checklist ratings of the same activities by the other pair ranged from 80-100%, with a mean of 91.3% (Figure 3).

Insert Figure 3 about here



Discussion

These data seem to indicate that the checklist was a reliable and valid measuring instrument. The percent agreement between the two observers of Pair 1 failed to consistently fall within the 80-100% range. However, the scores were close to that level, and it is quite probable that with further training these observers could produce high reliability. The other team of observers consistently yielded reliable scores.

Comparing the checklist ratings with the frequency count data yielded validity within the acceptable 80-100% range. The results of the checklists appeared to accurately reflect the quality of the parent training behaviors being exhibited during the sessions (as defined by the definitions of the categories).

When two observers using the checklist agreed on a rating, that rating was usually either "all" or "none". Even though this may suggest that observers were just guessing in their ratings and merely checking either the low or high extreme of the rating, this does not seem to be the case. These same categories were rated "all" or "none" when computing the frequency count data, which confirmed the accuracy of the checklist ratings. Also, on the few categories which were reliably rated "few", "half", and "most" by two observers using checklists, the exact same ratings were received using the frequency count. Thus, it appears that the checklist ratings were indeed accurate representations of the parents' training behaviors.



Figure 2. Percent reliability agreement for Pair 2 (plot label: All Categories At One Time; x-axis: Activities).



CHECKLIST

Fill in each box for each technique. There are 3 measures: (1) Frequency of technique being used; (2) Quality of the use of each technique based on attached definitions; and (3) Additional problems observed in the use of the technique. Note: Whenever "none given" is checked go to next technique.

A. INSTRUCTIONS: None given, none needed ___   None given, but should have ___

   Frequency: Was this technique used enough?
      too few ___   just enough ___   too many ___

   Quality: When this technique was observed, was it used appropriately according to definitions?
      None (0%) ___   Few (1-33%) ___   Half (34-67%) ___   Most (68-99%) ___   All (100%) ___

   Additional Problems:
      talks too slow ___   talks too fast ___   talks too soft ___   talks too loud ___   other ___

B. PROMPTS: None given, none needed ___   None given, but should have ___

   Frequency: Was technique used enough?
      too few ___   just enough ___   too many ___

   Quality: When this technique was observed, was it used appropriately according to definitions?
      None (0%) ___   Few (1-33%) ___   Half (34-67%) ___   Most (68-99%) ___   All (100%) ___

   Additional Problems:
      Stops prompt when child resists ___   Too slow to give prompt ___   Too fast to give prompt ___   Other ___





Family and Infant Learning Program

Child ___   DATE ___
Parent Trainer ___   Home Trainer ___
Activity ___

The following definitions of appropriate teaching techniques should be read and completed before observing a training session. Definitions may be modified or eliminated according to individual parent-child needs (to modify, fill in space provided; to eliminate, draw a line through undesired part of definition).

Instruction: (1) verbalizations that specify or cue target response (example: "Pick up the ball", "sweetie, over here");
   (2) no more than ___ consecutive instructions;
   (3) modifications ___

Prompts: (1) full or partial (circle 1 or 2) physical guidance to perform target response (example: grasping child's hand and putting it on the ball);
   (2) must occur within 3 seconds of an instruction if child doesn't do
   (3) modifications ___

Non-Word Cue: (1) motioning, gesturing, non-word sounds that cue target response (example: making a "come here" gesture with hands to cue the child to crawl toward parent);
   (2) each must be no more than ___ seconds duration;
   (3) must occur within 3 seconds of an instruction if child doesn't do
   (4) modifications ___

Reinforcement: (1) positive comments, gestures, tone of voice, physical for target response (example: parent hugging child and saying "you did it!");
   (2) must occur within 2 seconds of target response;
   (3) must occur for ___ seconds;
   (4) modifications ___

Models: (1) performing the target response so child can imitate (example: parent clapping hands when the target response is clapping hands);
   (2) no more than ___ consecutive models;
   (3) must occur within 3 seconds of an instruction if child doesn't do
   (4) modifications ___

Ignoring: (1) removing all physical and social contact (example: parent turning head away from child while child is behaving inappropriately);
   (2) must never occur after target or appropriate response;
   (3) must occur within 3 seconds of the "inappropriate" behavior;
   (4) modifications ___

The Second Pause Rule: Those techniques which are interrupted by a one-second pause and then repeated again are considered as one technique per pause. Example for Instructions: "Come over here", one-second pause, "Come here" ([number illegible] instructions).




C. NON-WORD CUES: None given, none needed ___   None given, but should have ___

   Frequency: Was this technique used enough?
      too few ___   just enough ___   too many ___

   Quality: When this technique was observed, was it used appropriately?
      None (0%) ___   Few (1-33%) ___   Half (34-67%) ___   Most (68-99%) ___   All (100%) ___

   Additional Problems:
      too distracting ___   unclear ___   wrong kind ___   other ___

D. MODELS: None given, none needed ___   None given, but should have ___

   Frequency: Was this technique used enough?
      too few ___   just enough ___   too many ___

   Quality: When this technique was observed, was it used appropriately?
      None (0%) ___   Few (1-33%) ___   Half (34-67%) ___   Most (68-99%) ___   All (100%) ___

   Additional Problems:
      too slow ___   too fast ___   child not watching ___   too long ___   Other ___

E. REINFORCEMENT: None given, no correct child behavior occurred ___   None given, but should have ___

   Frequency: Was this technique used enough?
      too few ___   just enough ___   too many ___

   Quality: When this technique was observed, was it used appropriately?
      None (0%) ___   Few (1-33%) ___   Half (34-67%) ___   Most (68-99%) ___   All (100%) ___






Summary of teaching:

A. Were SMALL STEPS rewarded appropriately?
      None given, no correct child behavior occurred ___   Yes ___   No ___

B. Use of teaching techniques:

                                     Needs Work
                            OK    Frequency   Quality
      1. Instructions       ___      ___        ___
      2. Prompts            ___      ___        ___
      3. Non-Word Cues      ___      ___        ___
      4. Models             ___      ___        ___
      5. Reinforcement      ___      ___        ___
      6. Ignoring           ___      ___        ___



   Additional Problems:
      Short Duration ___   Long Duration ___   Unexcited Delivery ___   None or little social praise ___   Given after inappropriate behavior ___   Other ___

F. IGNORING: None given, no inappropriate child behavior occurred ___   None given, but should have ___

   Frequency: Was this technique used enough?
      Too few ___   Just enough ___   Too many ___

   Quality: When this technique was observed, was it used appropriately?
      None (0%) ___   Few (1-33%) ___   Half (34-67%) ___   Most (68-99%) ___   All (100%) ___

   Additional Problems:
      Too long ___   Too short ___   Continued Attention to Child ___   Other ___



